Final Report of Office of Naval Research Contract,
Designing and Implementing an Intelligent Multimedia Tutoring
System for Repair Tasks
(N00014-85-K-0060)

The  purpose  of  this  report  is  to  describe  a  research  project  which 
began  in  September  1984  at  the  University  of  Colorado  and  ended  in  April 
1989  at  the  University  of  Michigan.  There  were  three  main  phases  in  the 
work,  two  at  Colorado  and  one  at  Michigan.  The  phases  were: 

1. Developing an interactive computer-controlled videodisc-based
system to help people learn to assemble an object, and testing how
people use it.

2. Designing and implementing a prototype "intelligent"
multimedia tutoring system, again videodisc-based, to help people
assemble, repair, and understand an object, and testing how people,
given different tasks, use it.

3. Developing a graphics-based system to help people repair an object,
and testing how people use several versions of it.


For  each  phase,  a  summary  of  the  work  will  be  given  generally  as 
follows:  the  problem  or  goal  will  be  described,  and  then  the  approach 
(theoretical  and  practical),  the  equipment  and  implementation, 
experimental  work,  and  results/conclusions/new  questions/evaluation 
of approach. Publications and talks on the research are listed in an
appendix at the end. Attached to this report is a new technical report
(Baggett, Ehrenfeucht, & Guzdial, 1989) describing in detail the main
study from phase 3, the Michigan phase, so that phase will be described
only very briefly in this report.

Phase 1. Videodisc-based procedural instructions.

Problem/goal  and  approach. 

When  the  project  first  began,  we  had  been  working  with  film  and 
video  instruction.  Videodisc  was  a  new  medium  for  us,  and  there  was 
not  much  equipment  available  for  combining  computerized  information 
and  videodisc  images  in  one  presentation.  The  first  phase  of  the  work  had 
a  modest  goal:  to  develop  interactive  videodisc-based  instructions,  under 
computer  control,  that  help  people  assemble  an  object.  The  instructions 
were  not  meant  to  be  "intelligent"  in  the  sense  of  Sleeman  &  Brown 
(1982).  There  was  to  be,  for  example,  only  one  level  of  instruction  for 
all  subjects,  and  no  error  diagnosis.  But  from  our  earlier  ONR-sponsored 
work  in  film  and  video,  we  had  learned  how  to  derive  a  "natural" 
conceptualization (breakdown of the object into parts and subparts), and
to  find  names  for  parts  that  were  short,  easily  matched  with  their 
physical  referents,  and  fairly  well  recalled.  We  used  these  old 
techniques  in  developing  the  new  instructions.  In  particular,  we  had 
developed  a  videotape  showing  assembly  of  an  80-piece  object,  called  a 
lift,  made  from  the  Fischer-Technik  assembly  kit.  It  contained  the 
"natural"  conceptualization  and  names,  so  we  pressed  a  videodisc  from 
that  videotape.  We  then  needed  to  develop  instructions  under  computer 
control. 


At  the  beginning  of  the  project  we  looked  for  appropriate  equipment 
on  which  to  do  our  implementation.  Lowry  Air  Force  Base  in  Denver 
provided  us  with  a  Triads  system,  one  of  only  three  or  four  ever  made.  It 
was  a  two-screen  system  (videodisc  images  on  one  monitor  and 
computer  information  on  another).  The  system  was  unique  and  very 
nonstandard,  and  the  documentation  poor.  It  eventually  broke  down,  and 
since  there  was  very  little  likelihood  of  making  it  operable  again,  we 
enlisted  the  help  of  IBM-Boulder.  Phil  Smith,  the  inventor  of  the  IBM 
InfoWindow  system,  provided  us  with  a  prototype  of  his  system 
(XT-based,  with  a  special  monitor  and  Pioneer  LDV-6000  videodisc 
player),  complete  with  a  beta  version  of  the  Composer/Conductor 
software  needed  to  design  presentations.  (InfoWindow  did  not  go  on  the 
market  until  about  1986,  and  this  was  1984.) 

With  Phil  Smith  and  the  IBM  Advanced  Educational  Systems  group  in 
Atlanta  as  consultants,  we  were  able  to  develop  our  first  presentation 
and  help  IBM  debug  its  software.  Input  to  the  system  could  be  via 
keyboard  and  touchscreen;  output  was  moving  and  still  video  with  text 
and/or  color  graphics  overlay.  There  were  three  sources  of  speech:  two 
from  the  two  videodisc  soundtracks  and  one  from  a  limited  speech 
synthesizer. 

The  optical  videodisc  required  by  InfoWindow  contains  up  to  54,000 
frames,  each  with  its  own  address.  The  disc  contains  up  to  30  min  of 
playing  time,  displaying  30  frames/sec.  On  our  equipment  the  access 
time from one frame to any other is approximately 1.6 sec maximum.

Our  videodisc,  pressed  from  the  videotape  mentioned  above,  was  27  min 
long. 
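The figures above are related by simple arithmetic: 54,000 frames at 30 frames/sec gives 1,800 sec, or 30 min, and a 27-min disc occupies about 48,600 frame addresses. A minimal Python sketch of these conversions follows; the function names are ours, and the assumption that frame addresses start at 1 is also ours, not taken from the InfoWindow documentation.

```python
FRAMES_PER_SEC = 30      # playback rate given in the text
MAX_FRAMES = 54_000      # frame capacity of the disc given in the text


def frames_to_minutes(frames: int, fps: int = FRAMES_PER_SEC) -> float:
    """Playing time, in minutes, of a run of consecutive frames."""
    return frames / fps / 60


def minutes_to_frames(minutes: float, fps: int = FRAMES_PER_SEC) -> int:
    """Number of frame addresses occupied by a given playing time."""
    return round(minutes * 60 * fps)


def frame_to_timecode(address: int, fps: int = FRAMES_PER_SEC) -> str:
    """Elapsed playing time (mm:ss) at a frame address; assumes addresses start at 1."""
    seconds = (address - 1) // fps
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


print(frames_to_minutes(MAX_FRAMES))  # 30.0, i.e. a full disc holds 30 min
print(minutes_to_frames(27))          # 48600 frames for the 27-min lift disc
print(frame_to_timecode(48_600))      # 26:59
```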

Design of the instructions for assembly of the lift is shown in
Figure 1. It was implemented by Jeffery Weiss on an IBM XT-PC, using
videodisc and IBM's Composer/Conductor software (since renamed
InfoWindow Presentation System software).
Experimental work/new questions.

Our  first  goal  in  the  experimental  work  was  to  show  something  that 
we  thought  was  obvious:  that  people  given  interactive  control  over 
instructions  and  the  ability  to  practice  (i.e.,  actually  perform  the 
assembly)  while  they  viewed  would  be  able  to  perform  an  assembly  task 
later  from  memory  better  than  people  who  simply  viewed  videotape 
instructions without interactive control and without practicing. (Note:
by practice we will mean actually performing the assembly during
instruction.) What we actually found, after running 64 subjects, was not
what  we  expected.  We  found  that  there  was  no  difference  in  the  two 
groups  in  performing  the  assembly  from  memory,  in  terms  of  structural 
correctness,  functionality,  or  efficiency  (correctness  divided  by  total 
time  to  work).  (Details  are  in  Baggett,  1988.)  A  side  remark  is  that 
there  were  also  no  gender  differences,  as  we  will  discuss  below. 

To  try  to  pin  down  why  interactive  instruction  with  building  on-line 
does  not  lead  to  better  memory  performance  than  passive  video,  we  first 
hypothesized  that  the  interactive  group  had  a  dual  motoric  task:  build  the 
lift  and  operate  the  touch  screen.  We  thought  that  perhaps  operating  the 
screen  interferes  with  learning  to  build.  So  we  reasoned  that  people  who 
are  given  interactive  instructions  and  not  allowed  (during  training)  to 
practice  would  perform  even  worse  from  memory  than  either  of  the 
already  tested  groups.  We  tested  such  a  group,  and  to  our  surprise,  it 
performed  no  worse  on  the  structural  and  functional  measures,  and 
significantly  better  on  the  efficiency  measure  than  the  first  two  groups. 
Clearly we were guessing wrong about the role of practice (motoric
actions)  in  concept  formation.  We  had  begun  the  research  with  a 
theoretical  model  which  assumed  that  motoric,  visual,  and  verbal 
elements  are  integrated  together  into  a  single  concept.  But  our  results 
indicated  that  the  motoric  component  seems  to  stand  alone.  We  have 
hypothesized  a  modified  framework  which  says  that  learning  consists  of 
two elements: understanding (a cognitive process) and skill acquisition (a
noncognitive  process).  Understanding  involves  forming  and  modifying 
concepts.  Skill  acquisition  comes  through  practice.  Understanding  is 
analogous  to  forming  an  algorithm,  and  practice  is  analogous  to 
executing  the  algorithm.  Most  typically  when  one  executes  an  algorithm 
one  does  not  increase  one’s  understanding  (unless  during  the  execution 
one  is  noticing  something  new  or  debugging  the  algorithm,  i.e.,  checking 
that  it  is  okay).  Rather,  the  primary  role  of  practice  is  to  speed  up  the 
algorithm's  execution.  In  our  experiment,  the  group  which  practiced 
on-line  during  instruction  apparently  did  not  form  a  less  buggy  algorithm 
than the group which did not practice. But they did perform the memory
trial  faster  (50  min)  than  the  non-practice  group  (43  min). 

Continuing  to  try  to  pin  down  the  role  of  practice,  we  tested  a 
fourth  group  whose  situation  was  similar  to  that  of  the  group  that 
practiced  during  instruction,  but  with  one  difference.  During 
instruction,  when  they  wanted  to  perform  any  part  of  the  assembly,  they 
had  to  indicate  this  by  touching  the  word  "build"  on  the  screen.  The 
screen  would  then  turn  black,  with  the  word  "return"  in  the  corner.  If  the 
subject  touched  "return,"  the  black  screen  was  replaced  by  a  still  frame 
of  the  image  that  had  been  present  just  before  "build"  was  touched. 

The  purpose  of  the  "black  screen"  experiment  was  to  check  whether 
the  decrement  in  the  interactive-build  group  was  one  of  divided  visual 
attention,  namely,  having  to  watch  one's  hands  and  the  screen 
simultaneously. With the black screen presentation, one's attention is
not  divided:  one  watches  either  one's  hands  or  the  screen.  Further,  the 
viewer  actually  performs  a  memory  trial,  broken  as  he  or  she  wishes 
into  small  pieces,  during  the  instructions.  That  is,  he  or  she  is  allowed 
to  work  only  from  memory,  with  just  the  black  screen  present;  there  is 
no  out-and-out  copying  or  mimicking  (see  also  Palmiter,  Elkerton,  & 
Baggett,  in  press). 

Results  on  the  memory  trial  for  the  black  screen  group  were 
significantly  different  for  males  and  females,  the  first  gender 
difference  in  the  study.  On  structural  correctness  males  scored  highest 
of  any  group,  but  not  significantly  higher  than  the  interactive-no  build 
males.  Females  scored  lowest  of  any  group  but  not  significantly  lower 
than  the  interactive-build  females  or  the  passive  video  females. 

Combining  results  from  both  genders,  the  black  screen  presentation 
leads  to  slightly  but  not  significantly  worse  performance  than  the 
interactive-no  build  presentation.  Thus  far  we  have  not  found  a 
presentation  condition  in  which  practice  (actually  performing  the 
assembly)  is  included  in  instruction  and  performance  on  a  later  memory 
trial  yields  significantly  better  structural  or  efficiency  scores  than 
when  practice  is  not  included. 

Our  final  manipulation  to  look  at  the  effect  of  practice  on 
performing  a  procedure  from  memory  placed  a  7-day  delay  between 
training  and  test  for  two  interactive  groups:  one  which  practiced  during 
training  and  one  which  did  not.  We  thought  that,  even  if  practice  during 
training did not help performance from memory when one was tested
immediately  after  training,  we  might  see  a  positive  effect  of  practice 
with  a  delay  between  training  and  test.  Once  again  the  results  showed 
that we were wrong. Combining data from males and females, there
were  no  differences  in  structural  or  efficiency  performance  between 
those  who  practiced  during  training  and  those  who  did  not.  (While  we  did 
not  perform  an  independent  statistical  analysis  with  gender  as  a 
variable,  it  appears  that  building  during  training  actually  helped  our 
(novice)  female  subjects,  while  it  made  no  difference  to  our  male 
subjects.) 

This experimental work raises new questions about practice. Exactly
what characterizes it, and what is its role in learning a procedure in
which the motoric elements required for the task are actually known to
people (everybody can join together two blocks)? At the end of this
report we will discuss an experiment currently being designed in our lab
which will attempt to look at these questions.

Before  turning  to  phase  2,  the  lack  of  a  gender  difference  on  most 
of  our  assembly  measures  deserves  comment.  In  some  of  our  previous 
work on assembly (e.g., Baggett & Ehrenfeucht, 1988) a gender difference
was our largest effect. We also have data to show that females
subjectively rate themselves as novices in assembly, while males rate
themselves considerably higher. A key element in making the gender
difference disappear is a change in what is actually shown in the
instructions (Baggett & Ehrenfeucht, in progress). Females performed as
well as males when two conditions held. First, the video display
consisted of two images: one showing the current goal (e.g., a completed
subassembly), and the other showing hands working toward that goal.
Second, the instructions were presented as a step-by-step procedure
rather than top-down, breadth first. Take away the goal, and present
instructions in a non-executable order, and performance by females (but
not by males) falls. When the goal is present and the order is
step-by-step, we
hypothesize  that  working  memory  is  relieved  of  some  of  its  load.  We 
find  these  results  intriguing  but  would  like  to  see  them  replicated. 

Phase 2. Designing and implementing a prototype "intelligent" multimedia tutoring system

Problem/goal and approach.

In  the  work  proposed  for  ONR  we  put  forth  an  example  of  a  new 
multimedia  knowledge  representation  to  be  implemented  as  a  data 
structure  for  a  tutoring  system  for  assembly,  repair,  and  understanding 
of  real  physical  objects.  The  knowledge  representation  and  the 
processes  that  work  on  it  are  an  embodiment  of  our  hypotheses  about 
how  people  represent  and  process  information.  The  ideas  for  the 
knowledge representation (data structure) were an extension of our
previous  work  in  assembly;  in  the  old  work  (e.g.,  above),  the  data 
structure  consisted  of  nodes  representing  pieces  of  the  object  and  links 
representing  physical  connections.  The  new  data  structure  incorporated 
some  new  kinds  of  nodes,  those  indicating  actions  to  assemble  or 
disassemble  pieces,  those  indicating  names  (for  pieces,  subassemblies, 
and  actions),  and  abstract  nodes  indicating  circuitry,  functionality,  and 
structure.  Some  of  the  different  node  types  correspond  to  different 
modalities  in  our  multimedia  theoretical  framework:  action  nodes  are 
motoric;  piece  and  subassembly  nodes  are  visual,  and  name  nodes  are 
verbal.  In  addition,  the  new  data  structure  incorporated  two 
(directional)  link  types,  one  to  be  interpreted  as  subconcept,  and  the 
other  as  causality  or  expectation.  (Details  of  the  data  structure  are 
given  in  Baggett,  Ehrenfeucht,  &  Hanna,  1987.) 
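To make the node and link types concrete, here is a minimal Python sketch of such a multimedia graph. It is not the Lisp or C code written for the project; the class names, the optional frame-range field, and the small flashlight fragment used as an example are hypothetical, chosen only to show how node kinds map onto modalities and how the two directional link types can be stored.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class NodeKind(Enum):
    PIECE = "piece"
    SUBASSEMBLY = "subassembly"
    ACTION = "action"        # assemble or disassemble pieces
    NAME = "name"            # names for pieces, subassemblies, and actions
    ABSTRACT = "abstract"    # circuitry, functionality, structure


# Modality of each node kind, following the text; abstract nodes get none.
MODALITY: Dict[NodeKind, Optional[str]] = {
    NodeKind.PIECE: "visual",
    NodeKind.SUBASSEMBLY: "visual",
    NodeKind.ACTION: "motoric",
    NodeKind.NAME: "verbal",
    NodeKind.ABSTRACT: None,
}


class LinkKind(Enum):
    SUBCONCEPT = "subconcept"
    CAUSES = "causality/expectation"


@dataclass
class Node:
    label: str
    kind: NodeKind
    frames: Optional[Tuple[int, int]] = None  # hypothetical videodisc frame range


@dataclass
class MultimediaGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    # Directed links: source label -> list of (link kind, target label).
    links: Dict[str, List[Tuple[LinkKind, str]]] = field(default_factory=dict)

    def add(self, node: Node) -> None:
        self.nodes[node.label] = node
        self.links.setdefault(node.label, [])

    def link(self, src: str, kind: LinkKind, dst: str) -> None:
        self.links[src].append((kind, dst))

    def successors(self, src: str, kind: LinkKind) -> List[str]:
        """Targets reachable from src along one link of the given kind."""
        return [dst for k, dst in self.links.get(src, []) if k is kind]


# Hypothetical fragment; the whole flashlight is stored as the top subassembly.
g = MultimediaGraph()
for label, kind in [("flashlight", NodeKind.SUBASSEMBLY), ("cap", NodeKind.PIECE),
                    ("bulb", NodeKind.PIECE), ("unscrew the cap", NodeKind.ACTION),
                    ("take out the bulb", NodeKind.ACTION)]:
    g.add(Node(label, kind))
g.link("flashlight", LinkKind.SUBCONCEPT, "cap")
g.link("flashlight", LinkKind.SUBCONCEPT, "bulb")
g.link("unscrew the cap", LinkKind.CAUSES, "take out the bulb")

print(g.successors("flashlight", LinkKind.SUBCONCEPT))  # ['cap', 'bulb']
print(MODALITY[g.nodes["unscrew the cap"].kind])        # motoric
```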

The  goals  of  phase  2  were  to  choose  a  reasonably  complex  object;  to 
implement  the  data  structure  for  the  object,  using  videodisc;  to  design 
and  implement  an  easily  usable  interface,  and  to  test  how  people  used 
the  system  to  assemble,  repair,  and  understand  the  object. 


Before  implementing  the  data  structure  for  a  relatively  complex 
object,  we  did  two  implementations  (verbal  part  only;  no  videodisc)  for 
the  simple  flashlight  given  in  the  original  ONR  proposal.  Mike  Perry  did 
an  implementation  in  Lisp  and  John  Hanna  did  one  in  C,  both  on  a  VAX 
11/780.  We  were  encouraged  by  the  results.  When  the  user  asked  a 
question  (in  a  very  constrained  way),  the  graph  was  processed,  and  the 
"answer"  to  the  query  was  presented  (verbally)  to  the  user,  based  on  a 
particular  graph  traversal  found  as  a  result  of  the  query.  For  example, 
when the user queried, "How do I remove the bulb?" the reply was,
"Unscrew the cap. Tilt and remove the reflector from the front part.
Take out the bulb." To our surprise (and amusement), when the user
asked  of  the  Lisp  implementation,  "How  do  I  remove  the  battery  from  the 
bulb?"  it  replied,  "Put  the  bulb  in  the  reflector.  Place  the  reflector  in 
the  top  part.  Screw  on  the  cap.  Unscrew  the  cap.  Remove  the  batteries 
from  the  case."  What  it  did  in  terms  of  the  graph  was  go  "up"  the  graph, 
building  the  whole  flashlight  from  the  bulb,  and  then  go  "down"  the  graph 
to  the  batteries.  (We  changed  the  graph,  based  on  this  answer,  so  that  in 
a  later  version  its  response  was,  "It  cannot  be  done.") 
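The "up and then down" answer is what a shortest-path search produces when it may follow both assembly and disassembly links. The Python sketch below reproduces the two replies quoted above on a small hypothetical flashlight graph (node names and link directions are guesses made from the quoted replies). Restricting a removal query to disassembly links is shown as one possible way to obtain the "It cannot be done." response; it is an illustration only, not the change actually made to the graph.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical flashlight graph: node -> [(neighbor, action sentence, direction)].
# "up" links assemble a part into the larger unit; "down" links take a unit apart.
EDGES: Dict[str, List[Tuple[str, str, str]]] = {
    "flashlight": [("front part", "Unscrew the cap.", "down"),
                   ("case", "Unscrew the cap.", "down")],
    "front part": [("flashlight", "Screw on the cap.", "up"),
                   ("reflector", "Tilt and remove the reflector from the front part.", "down")],
    "reflector": [("front part", "Place the reflector in the top part.", "up"),
                  ("bulb", "Take out the bulb.", "down")],
    "bulb": [("reflector", "Put the bulb in the reflector.", "up")],
    "case": [("flashlight", "Screw on the cap.", "up"),
             ("batteries", "Remove the batteries from the case.", "down")],
    "batteries": [("case", "Put the batteries in the case.", "up")],
}


def find_steps(start: str, goal: str, allowed: Set[str]) -> Optional[List[str]]:
    """Breadth-first search from start to goal using only links whose direction is allowed."""
    came_from: Dict[str, Optional[Tuple[str, str]]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            steps = []
            while came_from[node] is not None:
                prev, action = came_from[node]
                steps.append(action)
                node = prev
            return steps[::-1]
        for neighbor, action, direction in EDGES[node]:
            if direction in allowed and neighbor not in came_from:
                came_from[neighbor] = (node, action)
                queue.append(neighbor)
    return None


def reply(start: str, goal: str, allowed: Set[str]) -> str:
    steps = find_steps(start, goal, allowed)
    return " ".join(steps) if steps is not None else "It cannot be done."


print(reply("flashlight", "bulb", {"up", "down"}))   # remove the bulb
print(reply("bulb", "batteries", {"up", "down"}))    # up the graph, then down
print(reply("bulb", "batteries", {"down"}))          # It cannot be done.
```

With both directions allowed, the second query climbs from the bulb to the whole flashlight and then descends to the batteries, giving exactly the five-sentence reply quoted above.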

We  chose  a  40-piece  object,  a  string  crawler,  for  the  complex 
implementation.  Made  from  the  Capsela  assembly  kit,  it  is  a 
battery-powered  object  which  travels  forward  or  backward  along  a 
string when turned on. It is shown in Figure 1 of the attached technical
report.  Stages  in  the  tutor's  development  included: 

1. Using our old techniques, we derived its "natural" conceptualization
and  short  simple  names  for  its  parts  and  subassemblies.  We  expanded 
this  conceptualization  to  one  including  actions  (for  the  motoric  nodes  in 
our  graph). 

2.  For  designing  linguistic  access  (for  a  subject  using  a  keyboard),  we 
used  the  naming  data  collected  in  (1)  and  found  short  unique  character 
strings  which  would  be  used  as  access  keys  to  various  parts  of  the  data 
structure.  This  technique  is  given  in  detail  in  Baggett,  Ehrenfeucht,  & 
Perry  (1986). 

3.  We  designed  the  multimedia  graph  to  be  used  for  the  data  structure. 
This  step  is  analogous  to  the  step  in  a  production  system  model  where 
the  system  of  productions  is  written  by  a  person.  (The  evaluation  about 
whether  the  graph  was  correct  was  to  be  based  on  performance  of  the 
system  when  using  that  graph.) 

4.  We  shot  a  videotape  containing  images  of  all  string  crawler  parts  and 
subassemblies,  and  of  the  actions  of  assembling  and  disassembling  it 
according  to  the  "natural"  conceptualization.  An  image  was  shot  for  each 
visual  node  in  the  graph  developed  in  (3).  Each  image  on  the  videotape 
was  to  be  used  in  many  different  tasks  (many  different  graph 
traversals).  A  small  number  of  images  (less  than  30  min  of  video)  thus 
was meant to cover a huge amount of graph processing. The videotape
was narrated using terminology derived in (1). This tape was pressed
into a videodisc. A side comment is the following. An interesting
problem arose in trying to shoot an image (e.g., an action) that would fit
in  many  different  contexts  (graph  traversals).  In  (linear)  film,  one  image 
has  only  one  context,  so  the  problem  of  pictorial  continuity  is  easily 
solved.  But  here  one  action  image  could  be  the  predecessor  and  the 
successor  of  many  different  images.  We  fairly  successfully  solved  the 
problem  as  follows.  Each  action  was  shot  as  a  sequence  of  three  images: 
medium  shot,  extreme  close-up,  medium  shot.  When  a  particular  image 
was  the  first  in  a  graph  traversal  series,  the  program  would  select 
medium  followed  by  close-up.  When  it  was  in  the  middle,  only  the 
close-up  was  selected.  And  when  at  the  end,  the  program  selected  the 
close-up  followed  by  the  medium  shot. 

5. The data structure was implemented in C on a VAX 11/780 by John
Hanna. (Specifications were written by Rob Favero.) IBM later gave us
an  RT,  and  the  implementation  was  moved  to  it.  The  data  structure  was 
designed to be able to "answer" the following types of queries: (a) Show
me  X  (for  any  piece  or  subassembly  in  the  string  crawler),  (b)  How  do  I 
remove/replace  X?  (c)  Why  is  (or  is  not)  something  the  case?  (For 
example,  Why  doesn't  the  chain  move?)  (d)  How  does  the  string  crawler 
work? The answer would consist of processing the graph to find a
traversal  of  part  of  it,  and  then  displaying  on  the  screen  images  from  the 
nodes  on  the  traversed  graph. 

6.  The  IBM  InfoWindow  system  which  we  had  was  an  XT-PC.  Rob  Favero 
designed  its  connection  to  the  IBM  RT. 

7.  Using  Composer/Conductor,  Jeffery  Weiss  implemented  on  the  XT  a 
stand  alone  string  crawler  presentation.  Input  was  via  touch  screen,  and 
it was menu driven. On top of this presentation we added the
"intelligent" part, with keyboard input. Thus users had two choices for
input: touch a (verbal) menu label, or type some text. The latter would
be analyzed by the program, and a response would be given, as explained
in  5  above.  The  response  was  either  a  sequence  of  images  from  the 
videodisc,  or  "Will  you  please  rephrase  your  query?"  This  second 
response  meant  that  the  program  was  unable  to  match  the  current  input 
to  any  graph  traversal. 
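The shot-selection rule in item 4 can be stated as a small function of an action's position in a traversal. The Python sketch below is an illustration only: the names `SHOTS`, `shot_indices`, and `edit_list` are hypothetical, and playing all three shots when a traversal contains a single action is an assumption, since that case is not described.

```python
from typing import List, Tuple

# The three shots recorded for every action, in the order they were filmed.
SHOTS: Tuple[str, str, str] = ("medium shot", "extreme close-up", "medium shot")


def shot_indices(position: int, length: int) -> List[int]:
    """Indices into SHOTS for the action at `position` (0-based) in a traversal of `length` actions."""
    start = 0 if position == 0 else 1           # only the first action opens on the medium shot
    end = 2 if position == length - 1 else 1    # only the last action closes on the medium shot
    return list(range(start, end + 1))


def edit_list(actions: List[str]) -> List[Tuple[str, str]]:
    """Expand a traversal of action nodes into the (action, shot) sequence to play."""
    return [(action, SHOTS[i])
            for pos, action in enumerate(actions)
            for i in shot_indices(pos, len(actions))]


for action, shot in edit_list(["action A", "action B", "action C"]):
    print(f"{action}: {shot}")
```

Because every action contributes its close-up, any two actions can be joined at close-ups, and only the ends of a traversal need the wider establishing shots.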
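Item 2 mentions short unique character strings used as access keys; the actual technique is given in Baggett, Ehrenfeucht, & Perry (1986) and is not reproduced here. As a stand-in, the Python sketch below assigns each name its shortest unique prefix and matches typed input against those prefixes. The function names and the part names in the example are hypothetical.

```python
from typing import Dict, List, Optional


def access_keys(names: List[str], min_len: int = 2) -> Dict[str, str]:
    """Give each name its shortest prefix (at least min_len characters) that no other name shares."""
    lowered = [n.lower() for n in names]
    keys: Dict[str, str] = {}
    for name, low in zip(names, lowered):
        others = [o for o in lowered if o != low]
        for length in range(min_len, len(low) + 1):
            prefix = low[:length]
            if not any(o.startswith(prefix) for o in others):
                keys[name] = prefix
                break
        else:
            keys[name] = low  # name is a prefix of another name: fall back to the whole name
    return keys


def lookup(typed: str, keys: Dict[str, str]) -> Optional[str]:
    """Return the name whose key is the longest prefix of the typed text, or None."""
    typed = typed.strip().lower()
    matches = [(len(key), name) for name, key in keys.items() if typed.startswith(key)]
    return max(matches)[1] if matches else None


keys = access_keys(["motor", "spacer", "spring", "switch", "wheel"])
print(keys)  # {'motor': 'mo', 'spacer': 'spa', 'spring': 'spr', 'switch': 'sw', 'wheel': 'wh'}
print(lookup("Spring", keys))  # spring
print(lookup("gear", keys))    # None
```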


Experimental work and graph modifications.

About  30  people  (college  students)  tested  our  system,  and  based  on 
their input (both computer and questionnaire) we modified the system.
In particular, one thing we noticed was that our original front end which
allowed people keyboard access was not very good. Based on input from
the 30 participants, we tried to update and improve linguistic access.
I.e., we wanted fewer "Can you rephrase that?" responses from our
system,  and  more  presentations  of  helpful  information.  One  problem  that 
stayed  with  us  throughout  the  tutor's  development  was  that  people 
tended  to  avoid  keyboard  access  when  touch  screen  access  was  available. 
So  the  linguistic  material  from  the  30  people  was  quite  scanty  and  did 
not  give  us  much  to  work  with  in  improving  the  access.  We  will  come 
back  to  this  point  below. 

We  then  tested  150  people  (again  college  students),  25  in  each  of 
six  groups.  The  groups  differed  in  the  tasks  they  were  asked  to  perform; 
there  were  two  assembly  tasks  (one  with  extra  distractor  parts  present, 
and  one  with  no  distractor  parts)  and  two  repair  tasks  (A  and  B), 
differing  in  difficulty.  A  fifth  group  was  asked  to  prepare  for  a  test  on 
the  string  crawler  and  a  sixth  was  asked  to  find  bugs  in  the  system. 

After  completing  their  tasks,  they  were  given  a  questionnaire  which 
asked them about the string crawler's functionality. A log file was
automatically  created  as  each  person  used  the  system.  It  indicated  what 
key  or  touch  area  was  selected,  and  when.  As  mentioned  above,  people 
were  not  using  linguistic  access  very  much.  From  150  subjects  only  481 
different  words  (and  3642  total  words)  were  typed,  an  average  of  only 
24  per  person.  There  were  348  questions,  229  imperatives,  and  176 
keywords  or  key  phrases,  an  average  of  5  queries  per  person.  People 
were  not  very  inventive  or  varied  in  their  typing.  Further,  they  tended  to 
type  what  they  SAW  in  the  menu  labels  much  more  frequently  than  what 
they  HEARD  in  the  narration.  So  (after  moving  to  Michigan)  we  tested 
two  new  groups  (25  college  students  per  group)  who  were  allowed  no 
linguistic  access  but  touch  screen  only.  One  group  had  the  structured 
part of the tutor, with no "intelligent" part. The other was given free
unorganized  browsing,  with  no  structure.  This  last  group  could  "jump" 
from  one  part  of  the  videodisc  to  another,  by  touching  the  word  "jump," 
and  then  specifying  an  integer  to  indicate  how  many  "events"  ahead  (see 
below) they wished to jump. Both of the new groups did repair task A.

Several  types  of  data  analysis  were  performed  on  the  data  from  the 
eight  (six  old  and  two  new)  groups.  The  first  question  we  asked  was, 
how  similar  are  the  behaviors  of  people  in  the  different  groups?  In 
particular,  did  people  in  different  conditions  spend  similar  amounts  of 
time  viewing  the  same  parts  of  the  videodisc?  There  were  196  "events" 
in  the  presentation  (an  event  was  basically  one  or  more  pieces  of 
videodisc).  The  log  file  told  us  how  long  each  person  spent  in  each  event. 
We assigned to each person a 196-element vector, each element
indicating how long the person spent in that event. We determined the
distance between any two people in a group (using the L2, or Euclidean, norm), and the
distance  between  any  two  groups  (which  we  defined  as  the  average 
distance  between  any  two  people  in  the  two  groups).  For  each  group,  we 
calculated  its  closest  neighboring  group  (using  the  average  distance). 
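The distance computations just described can be sketched in a few lines of Python. This is a minimal illustration, not the original analysis code; the function names are ours, and the toy vectors have 3 events instead of 196 and are invented purely to exercise the functions.

```python
import math
from itertools import product
from statistics import mean
from typing import Dict, List

Vector = List[float]  # seconds spent in each event (196 events in the study)


def distance(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two people's time-in-event vectors."""
    return math.dist(a, b)


def group_distance(g: List[Vector], h: List[Vector]) -> float:
    """Average distance over all pairs made of one person from each group."""
    return mean(distance(a, b) for a, b in product(g, h))


def nearest_group(name: str, groups: Dict[str, List[Vector]]) -> str:
    """The group closest to `name` under the average distance."""
    others = [other for other in groups if other != name]
    return min(others, key=lambda other: group_distance(groups[name], groups[other]))


# Toy data: 3 events instead of 196, two people per group (invented numbers).
groups = {
    "repair A": [[10, 0, 5], [12, 1, 4]],
    "repair B": [[11, 0, 6], [9, 2, 5]],
    "touch only": [[0, 20, 1], [1, 18, 0]],
}
print(nearest_group("repair A", groups))  # repair B
```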

We  did  a  cluster  analysis  on  these  data  and  found  only  one  main  cluster, 
containing  the  six  touch  screen  plus  linguistic  access  groups.  The  two 
new  groups,  who  used  only  the  touch  screen,  were  outliers,  even  though 
they  did  a  repair  task,  identical  to  the  task  performed  by  one  of  the  other 
six  groups.  This  result  indicates  that  behavior  on  our  system  depends 
more  on  the  environment  one  is  in  than  on  the  task  one  is  doing,  at  least 
in  terms  of  amount  of  time  spent  in  various  events. 

The next question was, how varied is behavior within a group?
From the above calculations we knew the average distance of each group
member from every other member of his or her group. We drew a
diameter of one standard deviation around each group (about 2/3 of
members' behaviors fall within that diameter). Thus the diameter gave
us an idea of how dispersed the behavior in each group was. We found
these  diameters  to  be  huge,  in  comparison  to  the  distances  between 
groups.  All  groups  overlapped  substantially  in  their  behaviors.  We 
interpreted  this  to  mean  that  individuals  were  extremely  varied  in  their 
behaviors,  even  when  they  were  working  on  the  same  task.  Each  person 
seemed  to  explore  a  particular  part  of  the  tutor,  and  the  part  explored 
was  fairly  unique  to  the  individual.  The  large  variety  of  behaviors  within 
a  group  we  found  quite  surprising,  since  the  tasks  were  the  same.  We  do 
not  attribute  the  different  behaviors  to  individual  differences.  Rather, 
we think their behaviors are analogous to exploring different parts of a
map:  people's  behavior  is  creating  a  map  (where  the  map  is  the  computer 
environment),  and  then  finding  a  solution  corresponding  to  a  specific 
task  based  on  the  map. 

To determine which mode of input, touch screen or keyboard, was
preferred by users in the six groups that had both, we determined for each
log  file  entry  whether  it  came  from  keyboard  or  touch  screen,  and  we 
summed  up  times  for  the  two  modes.  We  learned  that  in  all  six  of  the 
groups  in  which  keyboard  was  available,  subjects  spent  approximately 
75%  of  their  time  using  touch  screen.  And  more  than  10%  of  the  subjects 
never  used  the  keyboard  once.  Subjects  explained  on  post-questionnaires 
that  they  much  preferred  touching  to  typing.  Comments  were,  "The  touch 
screen  choices  tell  me  what  information  is  available,"  "I  didn’t  know 
what  to  type,"  "I  don't  have  to  think  when  I  touch,"  "Touching  is  easier 
than  typing,"  etc. 
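The mode-preference tally can be sketched as follows. The report does not say how time was attributed to individual log entries; this Python sketch assumes, hypothetically, that each entry is credited with the time until the next entry, and the (timestamp, mode) log format is likewise invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def mode_shares(entries: List[Tuple[float, str]], session_end: float) -> Dict[str, float]:
    """Fraction of session time attributed to each input mode.

    entries: (timestamp in seconds, mode) pairs sorted by time, e.g. mode "touch" or "keyboard".
    Assumption: each entry is credited with the time until the next entry (or session end).
    """
    totals: Dict[str, float] = defaultdict(float)
    next_times = [t for t, _ in entries[1:]] + [session_end]
    for (t, mode), t_next in zip(entries, next_times):
        totals[mode] += t_next - t
    whole = sum(totals.values())
    return {mode: secs / whole for mode, secs in totals.items()} if whole > 0 else {}


log = [(0, "touch"), (45, "keyboard"), (60, "touch")]
print(mode_shares(log, session_end=120))  # {'touch': 0.875, 'keyboard': 0.125}
```

A subject who never typed would simply have no "keyboard" key in the result.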

The  success  rate  for  assembling  and  repairing  the  string  crawler 
was  close  to  100%  in  all  groups,  and  7  of  the  8  groups  spent  about  the 
same time working on their various tasks (between 33 and 39 min). The
group given unorganized access spent significantly more time, i.e., 51
min. Percentage correct on the string crawler test of understanding was
uniform and not high (less than 50%) in all groups. There was
basically a ceiling effect on the assembly and repair tasks, and not much
conceptual understanding (how and why something works). In hindsight
we should have selected a more complex object on which to base the
tutor,  and  we  should  have  presented  more  conceptual  and  perhaps  less 
procedural  information. 

Positive and Negative Aspects of the Tutor, and the Future of this Approach.

One  positive  aspect  to  the  implemented  data  structure  was  its 
compactness:  A  fairly  small  data  structure  covered  a  large  amount  of 
processing.  But  the  data  structure  was  not  modular,  and  a  problem  arose 
when it was modified. (Modification was done in order to accommodate
additional  processing.)  Modification  created  unpredictable  side  effects 
that  influenced  other  processes  that  were  previously  constructed  and 
that  were  not  "protected"  from  such  side  effects.  For  example,  adding  an 
extra  feature  so  that  the  system  could  correctly  answer  the  question 
"how?"  changed  how  the  system  answered  the  question  "what?"  in  quite 
an  unpredictable  way.  So  this  created  a  problem  with  the  expandability 
of  the  system. 

Also,  independent  of  the  fact  that  the  linguistic  front  end  was  not 
done  as  correctly  as  it  should  have  been,  we  had  made  an  a  priori 
assumption  that  subjects  would  use  far  more  access  from  the  keyboard 
than  they  actually  did.  A  clear  finding  was  that  subjects  avoided  using 
the  keyboard  when  they  had  other  (touch  screen)  access.  Thus,  as 
mentioned  above,  the  textual  material  provided  by  subjects,  and  which 
we  used  to  design  access,  was  small  and  irregular.  Our  linguistic  front 
end  was  reacting  to  text  material,  and  it  could  not  handle  very  many 
queries. 

Is there any future for this type of data structure? To review: if we
set up process one, for example, to answer the question "why?", this
corresponds to some traversal of the graph. Process two gets another
traversal that heavily uses the same nodes as process one. So many
processes can be accommodated on the same graph. But now suppose a
third process is added, and it requires extra links and perhaps extra
nodes. The unexpected effect was that process one or two would find a
new link (path) and use it. This had two effects:

(a) Repetitions could be created. That is, loops could be created in the
process: some material could be processed over and over again. This
occurred because the expansion added extra links and nodes.

(b) Wrong answers could be created. In our system, not finding an answer
corresponds to reaching a dead end. Suppose process one, when first
implemented, would occasionally (and correctly) respond, "I don't know."
Afterwards, new nodes or links might be added, so that process one now
finds new routes and gives spurious answers.
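The two effects can be illustrated with a small sketch. It is not the tutor's data structure, which is not reproduced here; it assumes a plain directed graph (a Python dict of hypothetical node names) and models a "process" as a depth-first walk that stops at the first node it accepts as an answer, or says "I don't know" at a dead end. Adding one link turns a correct "I don't know" into a spurious answer, as in (b); adding a link that closes a cycle makes the walk reprocess the same nodes, as in (a), unless it remembers where it has been.

```python
def traverse(graph, start, is_answer, remember_visited=True, max_steps=20):
    """Depth-first walk over links, standing in for one "process" (e.g. answering "why?").

    graph maps each node to the nodes it links to. Returns (answer, trace);
    the answer is "I don't know" when the walk reaches a dead end.
    """
    stack, visited, trace = [start], set(), []
    while stack and len(trace) < max_steps:  # max_steps caps runaway loops
        node = stack.pop()
        if remember_visited:
            if node in visited:
                continue  # skip material already processed
            visited.add(node)
        trace.append(node)
        if is_answer(node):
            return node, trace
        stack.extend(graph.get(node, []))
    return "I don't know", trace


def is_answer(node):
    # Hypothetical: the node this process treats as an explanation.
    return node == "battery"


# Hypothetical node names; not the project's actual graph.
before = {"wheel": ["axle"], "axle": []}
after_b = {"wheel": ["axle"], "axle": ["battery"]}                 # link added for another process
after_a = {"wheel": ["axle"], "axle": ["gear"], "gear": ["axle"]}  # added link closes a cycle

print(traverse(before, "wheel", is_answer)[0])    # I don't know
print(traverse(after_b, "wheel", is_answer)[0])   # battery (a spurious answer)
print(len(traverse(after_a, "wheel", is_answer, remember_visited=False)[1]))  # 20
print(traverse(after_a, "wheel", is_answer)[1])   # ['wheel', 'axle', 'gear']
```

Remembering visited nodes is one common way to stop the repetition in (a); it does nothing about (b), which would presumably require restricting which links each process may follow.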

Could  (a)  and  (b)  be  avoided?  The  answer  is  yes,  and  relatively 
easily.  But  redoing  the  graph  would  require  reprogramming  the  whole 
data  structure.  Performance  obtained  on  the  basis  of  users'  selection  of 
visual  material  was  already  creating  a  ceiling  effect  (almost  everyone 
could  perform  his  or  her  task  perfectly),  so  changing  the  data  structure 
under  these  conditions  would  not  effect  a  measurable  difference.  For  the 
size  of  the  problem  which  we  undertook,  we  have  a  satisfactory  solution, 
without  an  "intelligent"  data  structure.  Namely,  we  have  one  very 
specially prepared videodisc that covers a large number of tasks. How it
was  prepared  is  very  important.  A  presentation  using  touch  screen  only, 
with  no  data  structure  processing,  yields  as  good  performance  as  we 
could  get.  This  brings  us  to  one  general  observation:  The  performance  of 
an  elaborate  system  depends  on  the  amount  of  information  users  provide. 
With  both  touch  screen  and  keyboard  access,  users  do  not  provide  much 
information  from  the  keyboard. 

Phase 3. Graphics-based procedural instructions.

Problem/goal and approach.

In  phase  two  we  had  learned  that  human  performance  and 
understanding  after  studying  well-designed  and  well-organized 
instructions  with  simplified  input  was  as  good  as  that  from  so-called 
"intelligent"  instructions.  In  phase  three,  carried  out  at  the  University 
of  Michigan,  we  investigated  the  role  of  organization  and  access  in 
well-designed  graphics  (rather  than  videodisc)  instructions.  Upon  my 
move  to  the  School  of  Education  at  the  University  of  Michigan  in 
September 1987, the Office of Naval Research very generously allowed
the  purchase  of  a  large  amount  of  equipment,  which  was  unavailable  at 
the  School,  and  much  needed.  Included  were  two  Macintosh  II 
workstations,  with  very  large  external  hard  discs  and  many  pieces  of 
software.  As  described  below,  this  was  the  equipment  used  for  phase 
three. 


In  earlier  work  we  had  learned  that  a  good  (i.e.,  "typical") 
organization  in  passive  (videotape)  instruction  leads  to  better 
performance  than  a  "minority"  conceptualization  (Baggett  &  Ehrenfeucht, 
1988).  The  question  we  asked  in  this  study  was,  how  important  is  a  good 
organization  when  the  instructions  are  interactive  rather  than  passive? 
Suppose  we  shuffle  up  the  underlying  organization  (i.e.,  the  sequence  that 
one  gets  when  one  selects  forward  arrows),  but  make  sure  that  a  user 
can get from one piece of information to another in a small number of
moves  (choices).  Will  the  user  then  have  more  difficulty  (or  less 
success)  than  if  the  presentation's  underlying  structure  is  well 
organized? 

At  the  end  of  this  report  can  be  found  a  technical  report  describing 
the  study.  Only  a  very  brief  summary  is  given  here. 

Implementation. 


The  object  was  the  string  crawler  used  in  phase  2.  We  had  derived 
its "typical" tree structure (division into subassemblies) in phase 2, and
a  graphics  frame  (with  animation)  for  each  leaf  and  node  in  the  tree  (34 
in  all)  was  prepared,  using  the  Course  of  Action  authoring  language  for 
the  Macintosh  II.  We  selected  three  organizations  (sequences)  of  the 
frames.  The  first  ordering  was  such  that, when  one  viewed  it  from  first 
to  last  (clicking  on  forward  arrows  using  a  mouse),  one  would  observe  a 
correctly-  (and  typically-)  built  string  crawler.  The  second  ordering 
gave  a  randomly  shuffled  sequence  when  viewed  from  first  to  last.  And 
the  third  grouped  together  frames  that  had  parts  of  the  string  crawler  in 
common,  even  though  the  sequencing  was  otherwise  not  meaningful  (see 
the  discussion  of  visual  cohesion  in  the  technical  report). 

In  addition  to  access  via  forward  arrows,  one  could  select  one  or 
more  objects  in  a  frame  (indicated  by  stars),  and  one  would  go  to  the 
next  frame  which  contained  that  object.  We  termed  this  access 
"hypergraphics,"  rather  than  hypertext. 

Experimental  work. 


Ninety-six  subjects  were  tested,  16  males  and  16  females  in  each 
of  three  groups,  each  group  given  different  instructions.  Each  person's 
task  was  to  repair  the  string  crawler, which  was  identically  broken  for 
all  groups.  The  basic  questions  were:  (1)  How  does  access  (use  of 
forward  and  backward  arrows  versus  use  of  stars)  depend  on  the 
underlying  sequence?  (2)  How  does  underlying  sequence  influence 
performance?  We  thought  that  access  choice  would  vary  as  a  function  of 
underlying  sequence:  people  given  random  orders  would  select  more 
stars,  while  people  given  the  typical  order  would  select  more 
forward/backward  arrows.  We  also  thought  meaningful  sequences  would 
lead  to  better  performance  than  random  sequences. 

Results/Discussion.

The surprising finding was that there were no differences among the
groups in either use of access or performance on the repair task. We
offer a post hoc explanation: the instructions were interactive, and
therefore users held no expectations about organization. What was
important was short access (i.e., a small number of choices gets the user
from where he or she is to anywhere he or she wants to go). We were
able to calculate the shortest average distance (number of choices) from
frame i to frame j for each of the presentations. It was 5.6 for the
typical sequence, 2.8 for the random sequence, and 3.35 for the sequence
grouped by visual cohesion. Our observation is that perhaps deep
hierarchical menus are not the best way to design access. Perhaps all
that is needed is short access.
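The shortest average distance can be computed with a breadth-first search from every frame. The sketch below is an illustration under stated assumptions, not the program used in the study: it assumes that one choice follows a forward arrow, a backward arrow, or a star, where a star leads to the next frame around the "clock face" that contains the starred object, and that "previous" is left out because it depends on the user's history. Frames are represented as sets of hypothetical object names, and the function names are hypothetical.

```python
from collections import deque


def build_links(frames):
    """Map each frame index to the frames reachable in one choice (assumed access model)."""
    n = len(frames)
    links = {i: {(i + 1) % n, (i - 1) % n} for i in range(n)}  # forward and backward arrows
    for i in range(n):
        for obj in frames[i]:
            # A star on obj leads to the next frame around the clock face containing obj.
            for step in range(1, n):
                j = (i + step) % n
                if obj in frames[j]:
                    links[i].add(j)
                    break
        links[i].discard(i)  # staying put is not a move
    return links


def average_shortest_distance(links):
    """Mean number of choices from frame i to frame j over all ordered pairs i != j (needs >= 2 frames)."""
    n = len(links)
    total = 0
    for source in links:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search yields shortest path lengths
            node = queue.popleft()
            for nxt in links[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())  # the source itself contributes 0
    return total / (n * (n - 1))


ring = [set() for _ in range(5)]
print(average_shortest_distance(build_links(ring)))  # 1.5
with_star = [{"motor"}, set(), set(), {"motor"}, set(), set()]
print(round(average_shortest_distance(build_links(with_star)), 3))  # 1.667
```

With no stars, a five-frame ring averages 1.5 choices; in the six-frame example, the single star link between frames 0 and 3 lowers the average from 1.8 (the plain six-frame ring) to about 1.67.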
Final Comments on the Project.

The  work  done  on  this  project  covered  several  areas.  In  the  first 
phase,  we  rather  accidentally  began  looking  at  the  role  of  practice  in 
procedural  instruction,  as  the  result  that  practice  does  not  always  help 
and  sometimes  hurts  refused  to  go  away.  The  research  has  led  us  to 
begin  to  examine  more  closely  the  phenomenon  of  practice.  In  my  lab  at 
Michigan  we  are  almost  ready  to  begin  testing  subjects  to  examine  one 
issue  involving  practice.  We  hypothesize  that  when  one  practices,  the 
concepts  that  one  is  forming  become  somewhat  "frozen"  and  less 
modifiable  than  when  one  does  not  practice.  The  question  we  are  looking 
at  is,  is  a  motoric  component  (actual  actions  in  the  real  world) 
necessary  to  get  this  "freezing"?  We  consider  the  question  to  be 
fundamental,  and  it  is  a  key  question  in  our  evolving  theoretical 
framework  of  learning  and  memory.  In  a  nutshell,  we  are  looking  at 
learning as having two components, understanding and skill acquisition
(coming  through  practice).  As  mentioned  above,  we  consider 
understanding  (analogous  to  forming  an  algorithm)  to  be  a  cognitive 
process,  and  skill  acquisition  (analogous  to  executing  an  algorithm)  to  be 
a  non-cognitive  process.  We  are  currently  investigating  this  distinction 
for  mathematics  education. 

In  phase  two  we  learned  that  developing  a  so-called  intelligent 
multimedia  tutoring  system  is  indeed  difficult.  As  discussed  above, 
there  were  problems  (although  not  necessarily  insurmountable)  with  the 
data  structure  and  its  processing,  and  with  linguistic  access.  In 
addition, the well-organized non-intelligent part of our tutor led to
performance that was as good as that obtained when the intelligent part
was added. This may have been because the object (string crawler) was
not complex enough, or because of the linguistic access and processing
problems. But it could also indicate that well-organized information is
sufficient for (adult) humans, and that a so-called intelligent part is not
necessary. After coming to Ann Arbor, I did not have the necessary
computer science connections to continue the videodisc tutor's
development on any large scale (although we were able to get it up and
running, to develop one new videodisc implementation, and to test 50
students using it). Thus I consider the question of whether
"intelligence" is necessary in procedural instruction to be still open.

Phase three was our first experiment using graphics and animation
(rather than videodisc) on the Macintosh II. Its questions were not very
theoretical, but they were definitely practical. Its main result, that
organization doesn't matter for the interactive instructions we
developed, was surprising and deserves further investigation. A
comment is also in order on comparing videodisc and graphics
instruction, which we can of course do only to a very limited extent here.
From the post-questionnaire data in the graphics experiment we learned
that people were confused by the two-dimensional drawings of the
three-dimensional string crawler parts, and that sometimes they could
not discern the parts' orientation, or what was behind what. (We note
that the graphics were done by a professional graphics artist; examples
can be seen at the end of the attached technical report.) These
confusions did not occur with video images. Both the video and the
graphics presentations contained a schematic diagram showing the string
crawler's wiring, and people in all groups who found this frame spent a
lot of time viewing it. This raises the question of how to design
graphics to convey different kinds of information, both for building
concepts and for executing procedures.
Figure Caption

Figure 1. Design of interactive instructions for the lift in phase 1.

Arrows  indicate  options  available  to  the  user  via  touches  to  labels 
on  the  touch  screen.  Touching  "next"  took  user  to  next  unit.  "Short 
replay"  replayed  unit  just  viewed.  "Long  replay"  replayed  previous 
two  units.  "Extra-long  replay"  replayed  entire  subassembly.  And 
"Replay  whole  presentation"  replayed  from  the  beginning. 
Abstract

When  procedural  instructions  are  presented  noninteractively  and  are 
structured  well  (i.e.,  in  a  "typical"  way),  people  can  perform  the  procedure 
better  later  from  memory  than  when  the  instructions  are  presented 
atypically.  The  question  in  this  study  was,  what  is  the  role  of  organization 
(sequencing)  when  the  instructions  are  presented  interactively,  so  that 
people  can  choose  their  own  paths  through  the  material?  Using  computer 
graphics  and  animation,  we  designed  three  sets  of  instructions  showing 
the  assembly  of  a  40-piece  object  made  from  an  assembly  kit.  For  the 
first  instruction  set,  access  by  forward  arrows  gave  the  "typical" 
sequence.  For  the  second,  access  by  forward  arrows  gave  a  random 
sequence;  and  for  the  third,  such  access  put  together  information  that  was 
similar  in  terms  of  visual  elements  in  common  ("visual  cohesion"),  but  not 
in  terms  of  meaningful  organized  sequencing.  Another  kind  of  access  was 
also  available:  one  could  click  on  an  object  and  would  go  to  the  next 
"frame"  which  contained  that  object.  We  expected  that  people  in  the  three 
groups,  given  a  repair  task,  would  differ  in  their  performance  on  the  task 
and  in  their  use  of  access.  Surprisingly,  we  found  no  differences.  We  offer 
a  post  hoc  explanation:  when  instructions  are  interactive,  organization 
does  not  matter,  but  access  does.  As  long  as  one  provides,  as  we  did,  a 
short path between any two nodes (in our case, "frames"), the rest is not so
important.  This  could  mean  that  elaborate  hierarchical  menus  are  not  the 
best  way  to  design  access. 
Introduction.

A  recurring  theme  in  some  of  our  earlier  work  is  the  strong  role  of 
organization  in  procedural  instructions  (Baggett  &  Ehrenfeucht,  1988). 
When  instructions  are  presented  noninteractively  via  videotape  and  are 
structured  well  (i.e.,  in  a  typical  or  natural  way),  individuals  can  perform 
the  procedure  better  later  from  memory  than  when  instructions  are 
structured  in  a  less  typical  way.  Thus  an  old  finding  is  that  if  the  order  of 
learning  is  strictly  enforced  by  a  presentation,  then  sequencing  plays  an 
essential role. In the 1988 study, the procedure was to build an 80-piece
object,  a  lift,  made  from  pieces  in  the  Fischer-Technik  assembly  kit. 

The question in this study is whether sequencing is important when
subjects are provided with well-designed free access to the information,
i.e., when they can largely do what they want, viewing the information
in many different orders. The object used, a string crawler made from 40
pieces in the Capsela assembly kit, is shown in Figure 1. Figure 2 shows a
possible tree structure for the string crawler. This structure has been
determined to be the "typical" one, using techniques from Baggett &
Ehrenfeucht, 1988. Following the leaves of the tree from left to
right, taking the named parts and assembling them, one gets a correctly
built string crawler. This sequence, if enforced in a noninteractive
presentation, should lead to better performance from memory than some
other, atypical sequence.
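Reading a sequence off the tree amounts to a left-to-right traversal of its leaves. The sketch below shows only that step; the `Node` class, the `leaf_sequence` function, and the subassembly and part names are hypothetical placeholders, not the labels in Figure 2.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A subassembly (with children) or a part (a leaf)."""
    name: str
    children: list["Node"] = field(default_factory=list)


def leaf_sequence(node: Node) -> list[str]:
    """Names of the leaves, read from left to right."""
    if not node.children:
        return [node.name]
    sequence = []
    for child in node.children:
        sequence.extend(leaf_sequence(child))
    return sequence


# Hypothetical tree; the real one has its own subassemblies and 40 pieces.
crawler = Node("string crawler", [
    Node("drive unit", [Node("motor"), Node("battery box")]),
    Node("chassis", [Node("wheels")]),
])
print(leaf_sequence(crawler))  # ['motor', 'battery box', 'wheels']
```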

We  first  explain  the  notion  of  visual  cohesion  and  the  design  of  the 
presentations  and  their  implementation,  followed  by  the  experimental 
procedure.  We  then  present  the  rather  surprising  conclusions  and  a 
hypothetical  explanation  which,  if  correct,  can  have  useful  practical 
consequences. 
Visual Cohesion.

Besides the typical sequence, we chose four other sequences to test,
based originally on notions of text coherence (Kintsch & Vipond,
1977; Kintsch & van Dijk, 1978) and text cohesion (Halliday & Hasan, 1976).
Briefly, a text base in Kintsch's sense is coherent if it is connected by
argument repetition. For Halliday and Hasan, cohesion occurs through word
repetition, a noun and its pronoun referent, use of synonyms, etc. We
(Baggett & Ehrenfeucht, 1982) extended these notions to visual cohesion
and quantified the visual cohesion in a pictorial sequence taken from a
movie. In this study we use the same techniques. We prepared 41 frames
to correspond with the leaves and higher nodes of the tree in Figure 2. A
cohesion graph of these 41 frames, in their typical order, is shown in
Figure 3. The cohesion graph is formed as follows. The frames contain
computer graphics of string crawler parts. We selected 21 elements
(parts of the string crawler) occurring in the frames to include in the
cohesion analysis. For each element, we counted the number of times it
occurred in adjacent frames. For example, element 1 (a motor) occurred in
frames 1, 2, 3, 5, 7, 30-34, 37, 38, and 40. Its number of adjacencies was
thus 7 (for occurring in adjacent frames 1-2, 2-3, 30-31, 31-32, 32-33,
33-34, and 37-38). We summed the number of adjacencies over all
elements to determine a cohesion value for the sequence. The frames in
their typical order have a cohesion value of 96.
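The cohesion value can be sketched in a few lines, assuming each frame is represented as the set of selected elements (out of the 21) that it shows. Summing, for each element, the adjacent pairs in which it occurs is the same as summing, over adjacent pairs, the number of elements the two frames share; this is also why dividing by the number of adjacent pairs gives the average number of elements in common. The function names are hypothetical, and the usage lines rebuild the motor example assuming the 41 frames are numbered 0 to 40.

```python
def cohesion(sequence):
    """Sum, over adjacent frame pairs, of the number of selected elements both frames show."""
    return sum(len(a & b) for a, b in zip(sequence, sequence[1:]))


def adjacencies(sequence, element):
    """Number of adjacent frame pairs in which one element occurs in both frames."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if element in a and element in b)


def mean_shared(sequence):
    """Average number of elements in common per adjacent pair (cohesion / number of pairs)."""
    return cohesion(sequence) / (len(sequence) - 1)


# Rebuild the motor example: 41 frames numbered 0-40, with only the motor tracked.
motor_frames = {1, 2, 3, 5, 7, 30, 31, 32, 33, 34, 37, 38, 40}
frames = [{"motor"} if i in motor_frames else set() for i in range(41)]
print(adjacencies(frames, "motor"))  # 7
print(cohesion(frames))              # 7
```

With all 21 elements tracked in the typical order, the same `mean_shared` computation corresponds to 96/40 = 2.4.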

We chose the four other sequences as follows. First, the frames
were randomly ordered 1000 times, and the mean cohesion value was found
to be 45, so we selected two orders with value 45. Then, using
"hill-climbing" techniques, again 1000 times, we constructed random
sequences with high cohesion. They had a mean cohesion value of 131. (We
selected two orders, rather than one, to decrease any idiosyncratic effects
accidentally arising from one sequence.) Randomization of the orders was
meant to destroy any meaningful sequencing. Because we found no
differences between the groups receiving the two random sequences, and
also none between the groups receiving the two hill-climbing sequences,
the data in each case are combined. Thus there are 3 groups reported:
1 (typical), 2 (random), and 3 (hill-climbing). Each adjacent pair in
group 1 has, on average, 96/40 = 2.4 elements in common. The numbers
are 45/40 = 1.13 for group 2, and 131/40 = 3.28 for group 3.
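The two ways of generating orders can be sketched as follows. The random-order average follows the description directly. The hill-climbing part is only an assumption about what such a technique might look like, since the exact procedure is not given here: start from a random order, swap two frames, and keep the swap unless it lowers the cohesion value. The function names, the `steps` parameter, and the example frames are hypothetical.

```python
import random


def cohesion(sequence):
    """Number of selected elements shared by adjacent frames, summed over all adjacent pairs."""
    return sum(len(a & b) for a, b in zip(sequence, sequence[1:]))


def mean_random_cohesion(frames, trials=1000, rng=None):
    """Average cohesion over `trials` random shuffles of the frames."""
    rng = rng or random.Random()
    total = 0
    for _ in range(trials):
        order = list(frames)
        rng.shuffle(order)
        total += cohesion(order)
    return total / trials


def hill_climb(frames, steps=1000, rng=None):
    """Start from a random order; keep pairwise swaps that do not lower cohesion (assumed procedure)."""
    rng = rng or random.Random()
    order = list(frames)
    rng.shuffle(order)
    best = cohesion(order)
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        score = cohesion(order)
        if score >= best:
            best = score
        else:
            order[i], order[j] = order[j], order[i]  # undo a swap that lowered cohesion
    return order, best


# Hypothetical frames: each set lists the selected elements a frame shows.
frames = [{"motor"}, {"gear"}, {"motor", "gear"}, {"wheel"}, {"gear"}, {"wheel"}]
rng = random.Random(1)
print(mean_random_cohesion(frames, rng=rng))
order, value = hill_climb(frames, rng=rng)
print(value, cohesion(order))
```

Accepting swaps that leave the value unchanged lets the search move across plateaus instead of stopping at the first order with no strictly better neighbor.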

Design  of  Materials. 

The  presentation  was  implemented  on  a  Macintosh  II  using  the  Course 
of  Action  authoring  language.  Access  was  via  mouse  clicks.  Options 
available  in  each  frame  were  as  follows: 

1. Forward arrow. Goes to the "next" frame around a "clock face."

2. Backward arrow. Goes back one frame around the clock face.

3. A star by an object. Goes to the next frame around the clock face that contains the object. (This is access by visual cohesion.)

4. Previous button. A stack is kept, and "previous" goes in order to elements on the stack, allowing the user to retrace steps.

5. Exit button. Asks, "Do you really want to exit?"; exits if the user clicks yes, and takes the user back if the user clicks no.

6. Activity buttons. Cause animation, such as assembly of the parts shown.
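The access options can be summarized as a small state machine. This is a sketch, not the Course of Action implementation: frames are assumed to be sets of starred objects in forward-arrow order, the exit and activity buttons are left out, and a star whose object appears in no other frame is assumed to leave the user where they are. The class and method names are hypothetical.

```python
class Presentation:
    """Clock-face navigation over frames, with a stack for the "previous" button."""

    def __init__(self, frames):
        self.frames = frames  # frames[i] is the set of starred objects in frame i
        self.current = 0
        self.history = []     # stack of frames visited before the current one

    def _go(self, target):
        self.history.append(self.current)
        self.current = target

    def forward(self):
        self._go((self.current + 1) % len(self.frames))  # wraps around the clock face

    def backward(self):
        self._go((self.current - 1) % len(self.frames))

    def star(self, obj):
        """Go to the next frame around the clock face that contains obj."""
        if obj not in self.frames[self.current]:
            raise ValueError(f"no star for {obj!r} in frame {self.current}")
        n = len(self.frames)
        for step in range(1, n):
            candidate = (self.current + step) % n
            if obj in self.frames[candidate]:
                self._go(candidate)
                return
        # obj appears only in this frame: stay put (assumed behavior)

    def previous(self):
        """Retrace steps by popping the stack; do nothing if it is empty."""
        if self.history:
            self.current = self.history.pop()


p = Presentation([{"motor"}, set(), {"motor", "gear"}, set()])
p.forward()
p.backward()
p.star("motor")
print(p.current)  # 2
p.previous()
print(p.current)  # 0
```

Only the frame order passed to `Presentation` would differ between the five presentations; the star logic is the same for all of them, matching the point made below that only the sequencing was varied.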

It  is  important  to  remember  that  the  only  item  varied  in  the  five 
presentations  was  the  sequencing,  i.e.,  what  happens  when  the  user 
selects  the  forward  and  backward  arrows.  Access  via  stars  gave  the  same 
results  in  all  groups. 

Basic  questions. 

In  the  experiment,  subjects  were  given  a  broken  string  crawler,  told 
that  it  doesn't  work,  and  asked  to  fix  it.  They  were  also  told  that  the 
presentation  contained  information  that  would  allow  them  to  fix  it.  The 
basic  questions  were: 
• How does access (use of forward and backward arrows versus use of
stars)  depend  on  order? 

•  How  does  order  influence  performance? 

Our  original  hypotheses  were: 

1. Meaningful sequences will lead to better performance than random ones.
(Group 1, given the typical sequence, should outperform group 2, given the
random sequences.)

2. People given not meaningful, but cohesive, sequences (group 3, given
sequences determined by hill-climbing) will follow the forward and
backward arrows for access, while people given random sequences (group
2) will follow stars.

Thus  we  predicted  that  access  choices  would  vary  as  a  function  of 
sequence. 

Implementation. 

The drawings are two-dimensional and black-and-white, and they show no hands.
Names  of  parts  and  subassemblies,  derived  using  techniques  in  Baggett, 
Ehrenfeucht,  and  Perry  (1986),  are  printed  on  the  frames.  The  interface  is 
meant  to  be  obvious  or  invisible,  i.e.,  no  training  is  required  in  order  to  use 
it.  The  authoring  language  used,  Course  of  Action,  is  similar  to  HyperCard. 
Logfiles  were  kept  of  each  user  action,  for  later  data  analysis. 

Frames 0 and 1 are the same for all groups. Frames 0, 1, 2, 3, and 4
for group 1 are shown in Figure 4.1. Frames 2 through 4 for one of the
random  groups  are  shown  in  Figure  4.2.  And  frames  2  through  4  for  one  of 
the  hill-climbing  groups  are  in  Figure  4.3. 

Dependent  Measures. 

There  were  three  kinds  of  dependent  measures: 

1. Using the system. Proportion of forward, backward, previous, and
click-on-star choices; total time on system; mean time in frame; average
number of frame visits; average length of a string of "previous" choices;
average number of activity buttons selected.

2.  Performing  the  task.  Does  the  string  crawler  function?  Did  the  user  fix 
what  was  wrong  with  it? 

3.  User  satisfaction  (post  questionnaire).  Questions  on  screen  design  and 
options,  getting  lost,  and  subjective  satisfaction. 
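The measures in item 1 come from the logfiles. The sketch assumes a log that is simply a list of action names; the names ("forward", "backward", "previous", "star", "activity") and functions are hypothetical, since the logfile format is not described. It also assumes that any other action, such as an activity button, breaks a string of "previous" choices.

```python
from collections import Counter
from itertools import groupby

NAVIGATION = ("forward", "backward", "previous", "star")


def choice_proportions(log):
    """Share of each navigation choice among all navigation choices in one log."""
    nav = [action for action in log if action in NAVIGATION]
    if not nav:
        return {}
    counts = Counter(nav)
    return {action: counts[action] / len(nav) for action in NAVIGATION}


def mean_previous_run(log):
    """Average length of maximal runs of consecutive "previous" choices."""
    runs = [len(list(group)) for action, group in groupby(log) if action == "previous"]
    return sum(runs) / len(runs) if runs else 0.0


log = ["forward", "forward", "star", "previous", "previous", "activity", "backward", "previous"]
print(choice_proportions(log))
print(mean_previous_run(log))  # 1.5
```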

Subjects.  Ninety-six  college  students  served  as  subjects,  16  males  and 
16  females  in  each  of  three  groups.  They  were  paid  $5.00  per  hour  for 
participating. 

Results  and  Discussion. 

Results on use of the system are given in Table 1. Figure 5 shows the
percentage of forward, backward, previous, and click-on-star choices by
group. The surprising thing to notice from both Table 1 and Figure 5 is that
there are no differences among the groups. Figure 6 shows the time spent
in a frame, by group and by frame, with frames ordered by decreasing time
in frame. It shows that there are a few frames on which people spend a lot
of time (and that these are the same for all groups). The rest of
the frames are visited only briefly. The three most frequently visited
frames are the first one (the whole string crawler shown in Figure 4.1),
frame 38 (frame 3 in the hill-climbing sequence of Figure 4.3), and frame
30 (frame 2 in the hill-climbing sequence of Figure 4.3).

Table  2  gives  the  scores  on  performance  (whether  the  string  crawler 
works,  and  whether  all  the  repairs  were  made  correctly)  by  group.  Again, 
the  three  groups  perform  essentially  identically,  and  almost  perfectly. 

There  were  no  gender  differences  in  performance. 

Table 3 gives the ratings on the postquestionnaire, with 0 being
worst and 10 being best. Again, there were no group differences. The
sequencing was not particularly clear for any group, but getting back was
easy. Overall, the screen design was judged fairly good, but all groups
wanted more stars to click on. And overall the task was fun and
satisfying.

Interpretation. 

We expected that the underlying organization in interactive
instructions would make a difference in use of different kinds of access
and in performance of the task. Namely, we expected that people with
randomized orders would not use forward and backward arrows very much,
because of the lack of logical connections between consecutive frames,
or, if they did, that it would significantly decrease the quality of
their performance. But it did neither. Why? We offer the following post
hoc interpretation. The instructions were interactive, and therefore
subjects had no expectations about organization. In interactive
instructions, organization does not matter, but access does. Our current
hypothesis is that, as long as one provides a short path between any two
nodes (in this case, frames), the rest is not so important. A theorem by
Bollobas (1985, p. 241) says that for most graphs of n nodes, in which from
each node one can go to k others, the shortest average distance is
approximately $\log_{k-1} n$. Here, n = 41. We calculated the shortest
average distance from frame i to frame j for the presentations for groups
1, 2 (a and b), and 3 (a and b). They were: group 1, 5.6; groups 2 (a and b),
2.8; and groups 3 (a and b), 3.35. In fact, group 1, with the typical
organization, had the longest average path length of all the groups, and
groups 2 (randomly sequenced) had the shortest!

We note that the Bollobas theorem says that if there are 11 links
from each node, and 1,000,000 nodes, the shortest average distance is
approximately $\log_{10} 1{,}000{,}000 = 6$. Thus path lengths grow very
slowly compared to the number of nodes.
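The approximation can be evaluated numerically. This sketch only computes the formula $\log_{k-1} n$ as stated above; it says nothing about the actual presentations, whose number of links per frame varied, and the function name is hypothetical.

```python
import math


def approx_average_distance(n, k):
    """Approximate shortest average distance, log base (k - 1) of n, for n nodes with k links each."""
    if k <= 2:
        raise ValueError("k must be greater than 2")
    return math.log(n) / math.log(k - 1)


for n in (10**3, 10**6, 10**9):
    print(n, round(approx_average_distance(n, 11), 2))
```

With 11 links per node, each thousandfold increase in n adds only about 3 choices to the approximate average distance.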

Conclusions. 

We started this paper by noting that organization is important when
the order of learning is strictly enforced by a presentation. But the
results of this study indicate that organization does not seem to be
important when well-designed free access to the information is provided,
i.e., (hypothesis) as long as there is a short path between any two nodes.
This could mean that elaborate, deep hierarchical menus are not the best
way to design access.
Table 1

Use of System

(There were 16 males and 16 females in each group.)

                                        Group 1     Group 2     Group 3
                                        (typical)   (random)    (hill-climbing)

mean time on system (minutes)           30.8        32.7        36.6
mean time in frame (seconds)            19.1        20.4        19.3
mean number of frame visits             95.7        97.4        113.3
average length of "previous" string     2.4         2.3         2.2
average number of times activity        19.0        19.7        29.1
  buttons selected

Table 2

Performance Scores by Group

                                        Group 1     Group 2     Group 3
                                        (typical)   (random)    (hill-climbing)

functionality of string crawler         17.0        16.9
  (20 points possible)
corrections made                        32.0        32.6        32.2
  (35 points possible)
Table 3

Postquestionnaire Measures
(0 = worst; 10 = best)

                                              Group 1     Group 2     Group 3
                                              (typical)   (random)    (hill-climbing)

sequence: confusing/clear                     4.9         3.6         4.4
next screen: predictable/unpredictable        4.6         3.8         4.4
maintain a sense of where you are             4.8         4.9         5.3
getting lost (10 = didn't get lost)           5.7         6.0         6.1
getting back                                  8.4         8.4         9.0
clickable stars: hard/easy to find            8.3         8.0         7.9
discern orientation of parts                  6.2         5.7         5.6
layouts: cluttered/orderly                    7.6         6.9         6.9
number of click options                       2.3         3.0         3.9
  (0 = too few; 10 = too many)
hard work/fun                                 7.9         8.0         7.3
Figure 5. % Choices by Group and Access
Appendix

This appendix contains papers, talks, and technical reports on the Office of
Naval Research Contract, Designing and Implementing an Intelligent
Multimedia Tutoring System for Repair Tasks (N00014-85-K-0060).

Papers.

Baggett, P. Mixing verbal, visual, and motoric elements in instruction:
What's good and what's not? Proceedings, International Visual
Literacy Association, Twentieth Annual Conference, in press.

Baggett, P. & Ehrenfeucht, A. Textual and visual access to a computer
system by people who know nothing about it. Proceedings, Sixth
International Conference on Systems Documentation, Association for
Computing Machinery, in press.

Baggett, P. The role of practice in videodisc-based procedural
instructions. IEEE Transactions on Systems, Man, and Cybernetics,
special issue on human-computer interaction and cognitive
engineering, vol. 18 (4), 487-496, 1988.

Baggett, P. & Ehrenfeucht, A. Conceptualizing in assembly tasks. Human
Factors, vol. 30 (3), 269-284, 1988.

Baggett, P. Learning a procedure from multimedia instructions: the effects
of film and practice. Applied Cognitive Psychology, vol. 1, 183-195,
1987.

Baggett, P., Ehrenfeucht, A., & Hanna, J. Implementing a multimedia
knowledge representation for interactive procedural instructions.
Proceedings, Second Annual Rocky Mountain Conference on Artificial
Intelligence, 99-113, 1987.

Baggett, P. Interactive vs. passive multimedia instructions. Proceedings,
IEEE International Conference on Systems, Man, and Cybernetics,
1070-1075, 1986.

Baggett, P., Ehrenfeucht, A., & Perry, R. A technique for designing
computer access and selecting good terminology. Proceedings, First
Annual Rocky Mountain Conference on Artificial Intelligence,
167-179, 1986.

Baggett, P. Interactive instructions for procedural tasks. Proceedings,
33-37, 1985.


Talks. 

Baggett,  P.  &  Guzdial,  M.  Organization  and  access  in  graphics-based 

procedural  instructions.  Michigan  Association  for  Computer  Users  in 
Learning,  Detroit,  April  1989. 

Baggett,  P.  &  Ehrenfeucht,  A.  The  role  of  calculators  and  computers  in 

mathematics  education.  American  Educational  Research  Association 
Annual  Meeting,  San  Francisco,  March  1989. 

Baggett,  P.  Learning  and  practicing  procedures.  Office  of  Naval  Research 
Contractors'  Meeting  on  Intelligent  computer  aided  instruction, 

Orlando,  March  1989. 

Baggett,  P.  &  Ehrenfeucht,  A.  What  is  the  role  of  practice  in  cognition? 
Twenty-ninth  annual  meeting,  Psychonomic  Society,  Chicago, 
November  1988. 

Baggett, P. & Ehrenfeucht, A. Textual and visual access to a computer system by people who know nothing about it. Sixth International Conference on Systems Documentation, ACM, Ann Arbor, MI, October 1988 (invited).

Baggett, P. Mixing verbal, visual, and motoric elements in instruction: What's good and what's not? Twentieth Annual Conference, International Visual Literacy Association, Blacksburg, VA, October 1988 (invited).

Baggett, P. Using computers intelligently in education, and Using interactive videodisc: A surprising finding. Virginia Association of College Teachers of Education annual fall retreat, Richmond, VA, September 1988 (invited).

Baggett, P. Promises and problems of computers in education. Florida Model Schools Consortium, Living Seas, EPCOT, Orlando, September 1988 (invited).

Baggett, P. The role of practice in videodisc-based procedural instructions, and Learning in multimedia instructional environments. Seminar for Deans of Schools of Education, Ann Arbor, June 1988 (invited).

Baggett, P. Designing, implementing, and using a multimedia tutoring system for procedures. American Educational Research Association Annual Meeting, New Orleans, April 1988.

Baggett, P. How people maneuver in multimedia instructional environments. Michigan Association for Computer Users in Learning, Grand Rapids, March 1988.

Baggett, P. Multimedia instructional environments. Office of Naval Research Contractors' Meeting on Advanced Educational Systems, Pittsburgh, March 1988.

Baggett, P. How people use a multimedia computer system for interactive procedural instructions. Psychonomic Society, Twenty-eighth Annual Meeting, Seattle, November 1987.

Baggett, P., Ehrenfeucht, A., & Hanna, J. Implementing a multimedia knowledge representation for interactive procedural instructions. Second Annual Rocky Mountain Conference on Artificial Intelligence, June 1987.

Baggett, P. Mixing practice with interactive procedural instructions does more harm than good. Cognitive Research Group, Princeton University, March 1987 (invited).

Baggett, P. Putting visual material into a database for a so-called intelligent tutor: problems and solutions. Office of Naval Research Contractors' Meeting, Yale University, March 1987.

Baggett, P. Learning procedures from interactive videodisc vs. passive video. Psychonomic Society, Twenty-seventh Annual Meeting, New Orleans, November 1986.

Baggett, P. Interactive vs. passive multimedia instructions. Symposium on Human-Computer Interaction and Cognitive Engineering, 1986 IEEE Conference on Systems, Man, and Cybernetics, Atlanta, October 1986 (invited).

Baggett, P. & Ehrenfeucht, A. Developing a so-called intelligent tutor using videodisc. Advanced Educational Systems Group, IBM-Atlanta, October 1986 (invited).

Baggett, P. Motoric, visual, and linguistic concepts: their formation and integration in memory. Symposium on Acquiring Knowledge from Text-Picture Interactions, Tübingen, West Germany, July 1986 (invited).

Baggett, P., Ehrenfeucht, A., & Perry, R. A technique for designing computer access and selecting good terminology. First Annual Rocky Mountain Conference on Artificial Intelligence, Boulder, CO, June 1986.

Baggett, P. An interactive multimedia tutor for procedural tasks. American Association for the Advancement of Science, Southwestern and Rocky Mountain Division, Symposium on Cognitive Science: Theory, Methodology, and Application to Human-Computer Interaction, Boulder, CO, April 1986 (invited).

Baggett, P. Developing a multimedia tutor for procedures. ONR Contractors' Meeting on Advanced Technology, Xerox PARC, Palo Alto, CA, March 1986.

Baggett, P. & Ehrenfeucht, A. How people find information in a computer environment. Psychonomic Society, Twenty-sixth Annual Meeting, Boston, November 1985.

Baggett, P. Interactive instructions for procedural tasks. IEEE International Conference on Systems, Man, and Cybernetics, Tucson, November 1985 (invited).

Baggett, P. Multimedia cognitive engineering. Conference on Applications of Microcomputers to Problems of Cognitive Psychology, Education and Communication, UCLA, September 1985 (invited).

Baggett, P. Issues in interactive instructions. Army Research Institute, Alexandria, VA, June 1985 (invited).

Baggett, P. A multimedia knowledge representation for an "intelligent" computerized tutor. ONR Contractors' Meeting on Advanced Training Systems, Atlanta, January 1985.

Technical  Reports. 

Baggett, P., Ehrenfeucht, A., & Guzdial, M. Sequencing and access in interactive graphics-based procedural instructions. University of Michigan Educational Technology Technical Report, vol. 2 (1), September 1989.

Baggett, P. The role of practice in videodisc-based procedural instructions. University of Michigan Educational Technology Technical Report, vol. 1 (1), May 1988.

Baggett, P. & Ehrenfeucht, A. A multimedia knowledge representation for an "intelligent" computerized tutor. Institute of Cognitive Science Technical Report No. 142, University of Colorado, April 1985.